Search results

1 – 10 of 945
Article
Publication date: 20 December 2017

Arash Joorabchi and Abdulhussain E. Mahdi

Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of…

Abstract

Purpose

Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia.

Design/methodology/approach

The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. Specifically, the ML algorithm used is a binary classifier which classifies the candidate concepts into either “corresponding” or “non-corresponding” categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the “corresponding” category, based on a set of 14 positional, statistical, and semantic features.
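
As a minimal sketch of the kind of binary classification step described above, the following Python fragment trains a classifier over candidate concepts and ranks them by their probability of being the “corresponding” article. The feature values, their number, and the choice of logistic regression are illustrative assumptions; the paper's actual 14 positional, statistical, and semantic features and its ML algorithm are not reproduced here.

```python
# Illustrative sketch only: classify candidate Wikipedia concepts for one FAST
# heading as "corresponding" (1) or "non-corresponding" (0). The features and
# the choice of model are assumptions, not the authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one candidate concept; the columns stand in for the positional,
# statistical, and semantic features described in the abstract (14 in the paper).
X_train = np.array([
    [0.9, 0.8, 0.7],   # candidate closely matching the heading
    [0.1, 0.2, 0.3],   # unrelated candidate
    [0.8, 0.6, 0.9],
    [0.2, 0.1, 0.2],
])
y_train = np.array([1, 0, 1, 0])   # 1 = corresponding, 0 = non-corresponding

clf = LogisticRegression().fit(X_train, y_train)

# Rank unseen candidates by P(corresponding) and keep the most likely one.
X_candidates = np.array([[0.85, 0.70, 0.80], [0.15, 0.30, 0.10]])
probs = clf.predict_proba(X_candidates)[:, 1]
best = int(np.argmax(probs))
print(f"best candidate index: {best}, P(corresponding) = {probs[best]:.2f}")
```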

Findings

The authors have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively.
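
For reference, these measures follow the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN), with F taken as the balanced harmonic mean of precision and recall:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R}
```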

Research limitations/implications

The size of the data set used to evaluate the performance of the system is rather small. However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach.

Practical implications

The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration.

Social implications

The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two.

Originality/value

To the best of the authors’ knowledge, the current work is the first attempt at automatic mapping of Wikipedia to a library-controlled vocabulary.

Details

Library Hi Tech, vol. 36 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 April 1995

R.L. Stoll, A.E. Mahdi and J.K. Sykulski

Ceramic superconductors experience losses when carrying alternating currents. A first step in an attempt to macroscopically model the loss mechanism is to consider the ac…

Abstract

Ceramic superconductors experience losses when carrying alternating currents. A first step in an attempt to macroscopically model the loss mechanism is to consider the ac transport current in a ribbon that has a cross-section of width much greater than thickness. To some extent high-temperature superconductors behave in a way similar to type II superconductors in which the loss mechanism is described by the critical state model, where the current is assumed to flow with a constant critical density Jc and is independent of the magnetic flux density B and ∂B/∂t. The dominant mechanism is the irreversible motion of fluxoids due to their interaction with the pinning sites, resulting in a form of hysteretic loss that can be represented in macroscopic terms (in a system with only one component of magnetic field) as proportional to ∫Hs dBa/T over a complete cycle of period T, where Hs is the surface magnetic field strength and Ba is the space average value of flux density. However, it is found that the high-temperature materials exhibit strong flux creep effects, and so the critical state model may not provide a sufficient description. To find an alternative formulation it is necessary to consider the flux creep E-J characteristic of the ceramic material. If a highly nonlinear expression for the resistivity ρ can be found, it may be possible to model the flux and current behaviour as a diffusion process.
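
In equation form, the cycle-averaged hysteretic loss mentioned above, together with a commonly used power-law flux-creep E-J characteristic (the power-law form is a standard modelling assumption, not quoted from the paper), can be written as:

```latex
\bar{p} \;\propto\; \frac{1}{T} \oint H_s \, \mathrm{d}B_a ,
\qquad
E = E_c \left( \frac{J}{J_c} \right)^{n}, \quad n \gg 1,
\qquad
\rho(J) = \frac{E}{J} = \frac{E_c}{J_c} \left( \frac{|J|}{J_c} \right)^{n-1}
```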

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 14 no. 4
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 18 November 2013

Arash Joorabchi and Abdulhussain E. Mahdi

This paper aims to report on the design and development of a new approach for automatic classification and subject indexing of research documents in scientific digital libraries…

Abstract

Purpose

This paper aims to report on the design and development of a new approach for automatic classification and subject indexing of research documents in scientific digital libraries and repositories (DLR) according to library controlled vocabularies such as DDC and FAST.

Design/methodology/approach

The proposed concept matching-based approach (CMA) detects key Wikipedia concepts occurring in a document and searches the OPACs of conventional libraries via querying the WorldCat database to retrieve a set of MARC records which share one or more of the detected key concepts. Then the semantic similarity of each retrieved MARC record to the document is measured and, using an inference algorithm, the DDC classes and FAST subjects of those MARC records which have the highest similarity to the document are assigned to it.
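
The following toy Python fragment illustrates only the final inference step described above, namely taking the DDC classes and FAST subjects that dominate among the most similar MARC records. The record structure and similarity scores are invented for illustration; concept detection, WorldCat querying, and the actual similarity measure and inference algorithm are not shown.

```python
# Toy stand-in data: MARC-like records that share detected Wikipedia concepts
# with the input document, each with an assumed semantic-similarity score.
# Invented for illustration; not the authors' code or real WorldCat data.
from collections import Counter

records = [
    {"ddc": ["006.3"], "fast": ["Machine learning", "Text mining"], "sim": 0.82},
    {"ddc": ["006.3"], "fast": ["Machine learning"],                "sim": 0.77},
    {"ddc": ["025.4"], "fast": ["Subject headings"],                "sim": 0.40},
]

def infer_subjects(records, top_k=2):
    """Assign DDC classes and FAST subjects from the most similar records."""
    best = sorted(records, key=lambda r: r["sim"], reverse=True)[:top_k]
    ddc = Counter(c for r in best for c in r["ddc"]).most_common(1)
    fast = Counter(s for r in best for s in r["fast"]).most_common(3)
    return ddc, fast

print(infer_subjects(records))
# -> ([('006.3', 2)], [('Machine learning', 2), ('Text mining', 1)])
```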

Findings

The performance of the proposed method in terms of the accuracy of the DDC classes and FAST subjects automatically assigned to a set of research documents is evaluated using standard information retrieval measures of precision, recall, and F1. The authors demonstrate the superiority of the proposed approach in terms of accuracy in comparison to a similar system currently deployed in a large-scale scientific search engine.

Originality/value

The proposed approach enables the development of a new type of subject classification system for DLR, and addresses some of the problems similar systems suffer from, such as the problem of imbalanced training data encountered by machine learning-based systems, and the problem of word-sense ambiguity encountered by string matching-based systems.

Article
Publication date: 1 June 1999

J.K. Sykulski, M. Rotaru and R.L. Stoll

The paper presents an extension to previous work on modelling AC losses in high‐temperature superconducting tapes as a highly non‐linear diffusion process. Following successful…

Abstract

The paper presents an extension to previous work on modelling AC losses in high‐temperature superconducting tapes as a highly non‐linear diffusion process. Following successful formulation for a bulk superconductor the presence of silver in a tape has now been included, using a “sandwich” model, to represent more realistically the practical arrangement. The results of the extended 1‐D model are included and a new 2‐D scheme is described using finite difference formulation. Effects of non‐linearity are emphasised.
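
For orientation, a standard way of writing the 1-D slab version of such a non-linear diffusion problem (a textbook form; sign conventions, the silver layer, and the paper's 2-D extension are omitted) is:

```latex
\frac{\partial B}{\partial t}
  = \frac{\partial}{\partial x}\!\left( \frac{\rho(J)}{\mu_0}\,\frac{\partial B}{\partial x} \right),
\qquad
J = \frac{1}{\mu_0}\,\frac{\partial B}{\partial x},
\qquad
\rho(J) = \frac{E(J)}{J}\ \text{(highly non-linear)}
```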

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 18 no. 2
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 7 August 2009

Arash Joorabchi and Abdulhussain E. Mahdi

With the significant growth in electronic education materials such as syllabus documents and lecture notes, available on the internet and intranets, there is a need for robust…

Abstract

Purpose

With the significant growth in electronic education materials such as syllabus documents and lecture notes, available on the internet and intranets, there is a need for robust central repositories of such materials to allow both educators and learners to conveniently share, search and access them. The purpose of this paper is to report on the work to develop a national repository for course syllabi in Ireland.

Design/methodology/approach

The paper describes a prototype syllabus repository system for higher education in Ireland, which has been developed by utilising a number of information extraction and document classification techniques, including a new fully unsupervised document classification method that uses a web search engine for automatic collection of the training set for the classification algorithm.
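
A minimal sketch of this web-search-based training idea follows: each class label of the classification scheme is used as a query, and the retrieved text is treated as pseudo-labelled training data. The web_search() function is a stand-in returning canned text, not a real search engine API, and the class labels and classifier choice are illustrative assumptions rather than the system's actual components.

```python
# Unsupervised training-set collection sketch: class labels act as queries and
# the "retrieved" documents become training examples. web_search() is a
# hypothetical placeholder returning canned text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def web_search(query, n_results=2):
    canned = {
        "computer science curriculum": [
            "algorithms data structures programming languages",
            "software engineering operating systems databases",
        ],
        "mechanical engineering curriculum": [
            "thermodynamics fluid mechanics machine design",
            "materials dynamics manufacturing processes",
        ],
    }
    return canned[query][:n_results]

classes = ["computer science curriculum", "mechanical engineering curriculum"]
texts, labels = [], []
for label in classes:
    for doc in web_search(label):          # collect pseudo-labelled training documents
        texts.append(doc)
        labels.append(label)

clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(texts, labels)
print(clf.predict(["this syllabus covers programming and algorithms"]))
```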

Findings

Preliminary experimental results for evaluating the performance of the system and its various units, particularly the information extractor and the classifier, are presented and discussed.

Originality/value

In this paper, three major obstacles associated with creating a large-scale syllabus repository are identified, and a comprehensive review of published research work related to addressing these problems is provided. Two different types of syllabus documents are identified, and a rule-based information extraction system capable of extracting structured information from unstructured syllabus documents is described. Finally, the importance of classifying resources in a syllabus digital library is highlighted, a number of standard education classification schemes are introduced, and the unsupervised automated document classification system, which classifies syllabus documents based on an extended version of the International Standard Classification of Education, is described.

Details

The Electronic Library, vol. 27 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 March 2006

Abdulhussain E. Mahdi

This paper seeks to propose a new non‐intrusive method for the assessment of speech quality of voice communication systems and evaluate its performance.

Abstract

Purpose

This paper seeks to propose a new non‐intrusive method for the assessment of speech quality of voice communication systems and evaluate its performance.

Design/methodology/approach

The method is based on measuring perception-based objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into equivalent subjective mean opinion scores (MOSs). The required clustering and matching processes are achieved by an efficient data-mining tool known as the self-organizing map (SOM). The proposed method was examined using a wide range of distortions, including speech compression, wireless channel impairments, VoIP channel impairments, and modifications to the signal level by features such as automatic gain control (AGC).
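
The fragment below sketches only the matching and mapping step described above: each parametric vector of the degraded speech is matched to its nearest codebook reference, and the average distance is mapped to a MOS-like score. The random "codebook", the Euclidean distance, and the linear distance-to-MOS mapping are invented for illustration; the paper builds the codebook with a self-organizing map and uses perception-based auditory distances.

```python
# Output-based quality estimation sketch with invented data: match degraded
# speech vectors to nearest codebook references and map distance to a MOS.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 12))         # stand-in for SOM-trained reference vectors
speech_vectors = rng.normal(size=(200, 12))  # stand-in for features of the output speech

# "Auditory" distance here: Euclidean distance to the closest codebook reference.
dists = np.linalg.norm(speech_vectors[:, None, :] - codebook[None, :, :], axis=2)
avg_distance = dists.min(axis=1).mean()

# Map the objective distance to an equivalent MOS on the usual 1-5 scale
# (linear mapping with made-up coefficients, purely for illustration).
mos = float(np.clip(5.0 - 1.2 * avg_distance, 1.0, 5.0))
print(f"average distance: {avg_distance:.2f}, predicted MOS: {mos:.2f}")
```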

Findings

The experimental results reported indicate that the proposed method provides a high level of accuracy in predicting the actual subjective quality of the speech. Specifically, the second version of the method, which is based on the use of bark spectrum (BS) analysis, is more accurate in predicting the MOS scores compared with its first and third versions (which are based on BS analysis and mel frequency cepstrum coefficients (MFCC), respectively), and outperforms the ITU‐T PESQ in a large number of test cases, particularly those related to distortion caused by channel impairments and signal level modifications.

Research limitations/implications

It is believed that the developed prototype of the proposed objective speech quality measure is sufficiently accurate and robust against speaker, utterance and distortion-type variations. Nevertheless, there are still possible directions for further improvement and enhancement. In general, there are three areas that could be pursued: widening the coverage of speaker variations in the system's codebook; formulating and using a perceptual speech model that provides a true speaker-independent representation of speech; and implementing the proposed measure as a stand-alone system, preferably for real-time applications.

Practical implications

Being an output‐based method, the proposed method can be employed for monitoring and assessing telecommunications networks under both live traffic conditions and off‐line evaluation.

Originality/value

The main contribution of this paper is the introduction of a new output‐based, non‐intrusive method for the assessment of speech quality that is sufficiently accurate and robust. To the best of the author's knowledge, no reliable output‐based objective speech quality assessment method has to date been reported or formally recognised.

Details

Journal of Enterprise Information Management, vol. 19 no. 2
Type: Research Article
ISSN: 1741-0398

Article
Publication date: 7 March 2016

Arash Joorabchi, Michael English and Abdulhussain E. Mahdi

The use of social media and in particular community Question Answering (Q & A) websites by learners has increased significantly in recent years. The vast amounts of data…

Abstract

Purpose

The use of social media and in particular community Question Answering (Q & A) websites by learners has increased significantly in recent years. The vast amounts of data posted on these sites provide an opportunity to investigate the topics under discussion and those receiving most attention. The purpose of this paper is to automatically analyse the content of a popular computer programming Q & A website, StackOverflow (SO), determine the exact topics of posted Q & As, and narrow down their categories to help determine subject difficulties of learners. By doing so, the authors have been able to rank the identified topics and categories according to their frequencies, mark the most asked-about subjects and, hence, identify the most difficult and challenging topics commonly faced by learners of computer programming and software development.

Design/methodology/approach

In this work the authors have adopted a heuristic research approach combined with a text mining approach to investigate the topics and categories of Q & A posts on the SO website. Almost 186,000 Q & A posts were analysed and their categories refined using Wikipedia as a crowd-sourced classification system. After identifying and counting the occurrence frequency of all the topics and categories, their semantic relationships were established. These data were then presented as a rich graph which can be visualized using graph visualization software such as Gephi.
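
The sketch below illustrates the counting and graph-building step: count how often each Wikipedia-derived topic occurs across posts, connect topics that co-occur in the same post, and export the graph for Gephi. The post data are invented for illustration; the study itself analysed almost 186,000 SO posts.

```python
# Topic frequency counting and co-occurrence graph export (toy data).
from collections import Counter
from itertools import combinations
import networkx as nx

posts = [
    ["Recursion", "Binary tree"],
    ["Recursion", "Stack (abstract data type)"],
    ["Regular expression"],
]

topic_counts = Counter(t for post in posts for t in post)

G = nx.Graph()
for topic, count in topic_counts.items():
    G.add_node(topic, frequency=count)
for post in posts:
    for a, b in combinations(sorted(set(post)), 2):   # co-occurrence edges
        weight = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=weight)

nx.write_gexf(G, "so_topics.gexf")   # GEXF files open directly in Gephi
print(topic_counts.most_common(3))
```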

Findings

The reported results and corresponding discussion indicate that the insight gained from the process can be further refined and potentially used by instructors, teachers, and educators to pay more attention to, and focus on, the commonly occurring topics/subjects when designing their course material, delivery, and teaching methods.

Research limitations/implications

The proposed approach limits the scope of the analysis to a subset of Q & As which contain one or more links to Wikipedia. Therefore, developing more sophisticated text mining methods capable of analysing a larger portion of available data would improve the accuracy and generalizability of the results.

Originality/value

The application of text mining and data analytics technologies in education has created a new interdisciplinary field of research between the education and information sciences, called Educational Data Mining (EDM). The work presented in this paper falls under this field of research; and it is an early attempt at investigating the practical applications of text mining technologies in the area of computer science (CS) education.

Details

Journal of Enterprise Information Management, vol. 29 no. 2
Type: Research Article
ISSN: 1741-0398

Article
Publication date: 1 June 2015

Liang-Chu Chen, Ting-Jung Yu and Chi-Li Chang

This paper aims to collect the terminologies from the Ministry of National Defense military dictionary and to design a military-based wiki system, TMTpedia, to serve as a…

Abstract

Purpose

This paper aims to collect the terminologies from the Ministry of National Defense military dictionary and to design a military-based wiki system, TMTpedia, to serve as a collaborative and sharing platform for military personnel.

Design/methodology/approach

The development of the system is based on a prototype design and case illustration. The framework of the Taiwan Military Terminology Wikipedia system (TMTpedia) consists of three major subsystems, namely, Military Terminology Dictionary Processing, Military Article Contents Extension and Military Article and Resource Recommendation. This paper applies the MediaWiki engine to design the proposed TMTpedia, and the different functions embedded in its various system modules are developed using tools such as C#, Java and SQL Server.

Findings

In this demonstration, the focus is on the topic of “Communications, Electronics and Information Operations”, with illustrative cases that reveal the results of the TMTpedia system.

Originality/value

The main contributions of this paper are to transform military terminologies from a traditional dictionary into a Wiki-based platform that can provide a reference framework for knowledge collaboration; to extend the content of the TMTpedia system from an external knowledge encyclopedia through an extensible mechanism that can renew military concepts for the accuracy of knowledge sharing and transformation; and to implement a recommendation model in the TMTpedia system that dynamically provides relevant military information from external resources to enhance the effectiveness of knowledge acquisition.

Details

The Electronic Library, vol. 33 no. 3
Type: Research Article
ISSN: 0264-0473

Open Access
Article
Publication date: 2 April 2024

Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an…

Abstract

Purpose

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.

Design/methodology/approach

On a sample of over 230,000 records with close to 12,000 distinct DDC classes, Annif, an open source tool developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.
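
As a simplified illustration of the ensemble idea, the fragment below averages per-class suggestion scores from several backends and picks the top class. The scores, class labels, and plain averaging are invented for illustration; Annif's actual ensemble backends are configured and trained through the tool itself.

```python
# Toy ensemble: average per-class scores from four backends and pick the best.
import numpy as np

ddc_classes = ["004", "020", "370"]
scores = {
    "lexical":  np.array([0.10, 0.70, 0.20]),
    "svc":      np.array([0.20, 0.60, 0.20]),
    "fasttext": np.array([0.15, 0.55, 0.30]),
    "omikuji":  np.array([0.05, 0.80, 0.15]),
}

ensemble = np.mean(list(scores.values()), axis=0)
best = ddc_classes[int(np.argmax(ensemble))]
print(f"ensemble scores: {np.round(ensemble, 2)}, suggested DDC class: {best}")
```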

Findings

The best results were achieved using the ensemble approach, which reached 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.

Originality/value

The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 9 April 2018

Stuti Saxena

The purpose of this study is to underscore the significance, drivers and barriers towards re-use of open data sets in the context of Oman’s open government data (OGD) initiative.

Abstract

Purpose

The purpose of this study is to underscore the significance, drivers and barriers towards re-use of open data sets in the context of Oman’s open government data (OGD) initiative.

Design/methodology/approach

Following a qualitative framework, the paper invokes a documentary analysis approach to probe the OGD initiative of Oman. Specifically, the national OGD portal of Oman (https://data.gov.om/) is investigated. Furthermore, the paper invokes a theoretical model of “citizen engagement” (“Data over the wall”, “Code exchange”, “Civic issue tracker” and “Participatory open data model”) proposed by Sieber and Johnson (2015) to assess the extent to which open data sets may be re-used.

Findings

As per the theoretical model forwarded by Sieber and Johnson (2015), the OGD initiative of Oman sits at the cusp of the “Data over the wall”, “Code exchange” and “Participatory” models. Oman’s OGD initiative facilitates the re-use of the open data sets. However, there are challenges in re-using the open data sets as well. The paper underlines the prospects of better re-use of data sets by institutionalizing the OGD initiative across all administrative levels of the country.

Practical implications

This study holds relevance for practitioners and policy-makers in Oman to ensure the re-use of data sets is facilitated for generating public value.

Originality/value

Hitherto, research has underlined the significance of launching OGD initiatives in the West but studies in developing countries are few. The present study seeks to plug this research gap by underlining the significance of OGD re-usage in Oman’s context.

Details

foresight, vol. 20 no. 2
Type: Research Article
ISSN: 1463-6689
